A Statistical Analysis of the TREC-3 Data
Abstract
A statistical analysis of the TREC-3 data shows that performance differences across queries are greater than performance differences across participant runs. Generally, groups of runs which do not differ significantly are large, sometimes accounting for over half the runs. Correlation among the various performance measures is high.
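To illustrate what it means for query differences to exceed run differences, the sketch below splits the total variation in a small runs-by-queries score table into a run component, a query component, and a residual, in the style of a two-way layout without replication. The scores and the function name `two_way_sums_of_squares` are hypothetical, chosen only for illustration; they are not TREC-3 data, and this is not necessarily the procedure used in the paper.

```python
# Hypothetical average-precision scores: rows are runs, columns are queries.
# These values are invented for illustration; they are NOT TREC-3 results.
scores = [
    [0.10, 0.50, 0.80],
    [0.20, 0.60, 0.90],
    [0.15, 0.55, 0.85],
]


def two_way_sums_of_squares(table):
    """Split total variation into run, query, and residual sums of squares."""
    n_runs = len(table)
    n_queries = len(table[0])

    # Grand mean over every cell of the table.
    grand = sum(sum(row) for row in table) / (n_runs * n_queries)

    # Mean score of each run (row) and each query (column).
    run_means = [sum(row) / n_queries for row in table]
    query_means = [
        sum(table[i][j] for i in range(n_runs)) / n_runs
        for j in range(n_queries)
    ]

    # Variation explained by runs and by queries.
    ss_runs = n_queries * sum((m - grand) ** 2 for m in run_means)
    ss_queries = n_runs * sum((m - grand) ** 2 for m in query_means)

    # Whatever is left after removing both effects.
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_residual = ss_total - ss_runs - ss_queries

    return {"runs": ss_runs, "queries": ss_queries, "residual": ss_residual}


ss = two_way_sums_of_squares(scores)
print(ss)
```

In this invented table the query component is far larger than the run component, which is the pattern the abstract describes. A formal comparison would also need degrees of freedom and a significance test, which this sketch leaves out.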
Related resources
Finding Opinionated Blogs Using Statistical Classifiers and Lexical Features
This paper systematically exploited various lexical features for opinion analysis on blog data using a statistical learning framework. Our experimental results using the TREC Blog track data show that all the features we explored effectively represent opinion expressions, and different classification strategies have a significant impact on opinion classification performance. We also present res...
Content Locality in Time-Ordered Document Collections
Using newswire data sources from the TREC corpus, we show that the distribution of relevant documents with respect to time can be decidedly non-uniform. Many TREC topics show time-based clustering of relevant documents. We denote this clustering content locality and provide a simple metric for its measurement in time-ordered document collections. There is a marked positive correlation between con...
Estimating the Number of Relevant Documents in Enormous Collections
In assessing information retrieval systems, it is important to know not only the precision of the retrieved set, but also to compare the number of retrieved relevant items to the total number of relevant items. For large collections, such as the TREC test collections, or the World Wide Web, it is not possible to enumerate the entire set of relevant documents. If the retrieved documents are eval...
Indian Statistical Institute, Kolkata at TREC 2010: Legal Interactive
Indian Statistical Institute, Kolkata participated in TREC for the first time this year. We participated in TREC Legal Interactive task in two topics namely, Topic 301 and Topic 302. We reduced the size of the corpus by Boolean retrieval using Lemur 4.11 and followed it by a clustering technique. We chose members from each cluster (which we called seeds) for relevance judgement by the TA and as...
Query clustering and IR system detection. Experiments on TREC data
Variability in IR has been little considered as a way to improve system performance. In this paper, we consider linguistic variability of queries as a clue to predict which system will perform better for a particular query. More precisely, we cluster TREC topics with regard to 16 linguistic features. To each cluster is then associated a system that will be used to process all the queries belong...
Publication date: 1994